Discovering Test Set Regularities in Relational Domains

Authors

  • Seán Slattery
  • Tom M. Mitchell
Abstract

Machine learning typically involves discovering regularities in a training set, then applying these learned regularities to classify objects in a test set. In this paper we present an approach to discovering additional regularities in the test set, and show that in relational domains such test set regularities can be used to improve classification accuracy beyond that achieved using the training set alone. For example, we have previously shown how FOIL, a relational learner, can learn to classify Web pages by discovering training set regularities in the words occurring on target pages, and on other pages related by hyperlinks. Here we show how the classification accuracy of FOIL on this task can be improved by discovering additional regularities on the test set pages that must be classified. Our approach can be seen as an extension to Kleinberg’s Hubs and Authorities algorithm that analyzes hyperlink relations among Web pages. We present evidence that this new algorithm leads to better test set precision and recall on three binary Web classification tasks where the test set Web pages are taken from different Web sites than the training set.


Similar resources

Discovering Domains Mediating Protein Interactions

Background: Protein-protein interactions do not provide any direct information regarding the domains within the proteins that mediate the interactions. The majority of proteins are multi-domain proteins, and the interaction between them is often defined by pairs of their domains. Most earlier studies focus only on interacting domain pairs. However, they do not consider the in...


An Effective Algorithm for Discovering Fuzzy Rules in Relational Databases

In this paper, we present a novel technique, called F-APACS, for discovering fuzzy association rules in relational databases. Instead of dividing up quantitative attributes into fixed intervals and searching for rules expressed in terms of them, F-APACS employs linguistic terms to represent the revealed regularities and exceptions. The definitions of these linguistic terms are based on fuzzy se...


Discovering regularities from knowledge bases

Knowledge bases open new horizons for machine learning research. One challenge is to design learning programs to expand the knowledge base using the knowledge that is currently available. This paper addresses the problem of discovering regularities in large knowledge bases that contain many assertions in different domains. The paper begins with a definition of regularities and gives the motivatio...


Inductive Logic Programming for Discovering Financial Regularities

The purpose of this work is to discover regularities in financial time series using Inductive Logic Programming (ILP) and the related "Discovery" software system [Vityaev et al., 1992, 1993] in data mining. The discovered regularities were used to forecast the target variable, which represents the relative difference, in percent, between today's closing price and the price five days ahead. We describe the...


Discovering Regularities in Databases Using Canonical Decomposition of Binary Relations

Regularities in databases are directly useful for knowledge discovery and data summarization. As a mathematical background, relational algebra has helped in discovering the main data structures and the existing dependencies between the different attributes in a relational database. Functional, difunctional and other kinds of dependencies in a relational database describe invariant regular structures t...




Publication date: 2000